Rarity of Somatic Mutation and Frequency of Normal Sequence Variation Detected in Sporadic Colon Adenocarcinoma Using High-Throughput cDNA Sequencing

We performed high-throughput cDNA sequencing of a colorectal adenocarcinoma and matching normal colorectal epithelium. All 603 genes in the UCSC database that were expressed in colon cancers and had open reading frames (ORFs) of 1,000 nucleotides or less were selected for study (total, 366,686 bp). Of these 366,686 bp, 304,350 (83.0%) were amplified and sequenced successfully. Seventy-eight sequence variants present in germline (i.e., normal) as well as matching somatic (i.e., tumor) DNA were discovered, a frequency of 1 variant per 3,902 bp. Fifty-one of these variants were homozygous (26 synonymous, 25 non-synonymous), and 27 were heterozygous (11 synonymous, 16 non-synonymous). Cancer tissue expressed only one sequence-altered allele of the gene ATP50, whereas matching normal epithelium expressed this allele heterozygously alongside the wild-type allele. Despite the relatively large number of base pairs and genes sequenced, no somatic mutations unique to the tumor were found. High-throughput cDNA sequencing is a practical approach for detecting novel sequence variations and alterations in human tumors, such as those of the colon.


Introduction
It is widely believed that somatic as well as germline mutations play important roles in the origin and progression of colorectal cancers (Calvert and Frucht, 2002). Many genes have been screened for mutations to elucidate the mechanisms of colorectal cancer development, and these investigations have demonstrated the involvement of mutations in colorectal carcinogenesis and progression. Samuels et al. reported that PIK3CA, a catalytic subunit of the class IA phosphatidylinositol 3-kinases, was somatically mutated in 32% of colorectal cancers, resulting in attenuated apoptosis and facilitated tumor invasion (Samuels et al. 2004). A comprehensive study of "the tyrosine phosphatome" was accomplished by sequencing all tyrosine phosphatase genes in a large cohort of 175 colorectal cancer patients. Most mutational studies, however, have focused on the prevalence of somatic mutations in a single candidate gene in relatively small colorectal cancer cohorts. Recently, Sjoblom et al. reported the genome-wide frequencies of somatically mutated genes in human breast and colorectal cancers (Sjoblom et al. 2006). However, the methods they used would be prohibitively expensive, time-consuming, and labor-intensive for a typical laboratory. More practical strategies, amenable to smaller laboratories with more conservative budgets, would be of great value in the continuing effort to answer questions in tumor genomics and mutatomics. To this end, we present here a circumscribed, practical mutational study employing high-throughput cDNA sequencing in colon adenocarcinoma, in which we demonstrate that sequence variation can be determined efficiently and at low cost.

Tissue samples
Colorectal cancer tissue and matching normal colonic mucosa were obtained from a patient who underwent surgical resection at the Baltimore VA Hospital after giving informed research consent. Clinicopathological data were as follows: 75-year-old male; moderately differentiated adenocarcinoma of the ascending colon; tumor size, 2.5 × 1.1 × 0.5 cm; TNM stage T2N0MX (Fifth Edition of the UICC TNM classification, 1997); no other malignancies. Both the adenocarcinoma and normal colonic epithelium (obtained from the site within the resected specimen furthest from the tumor) were cut into smaller pieces and frozen in liquid nitrogen immediately after removal. A frozen aliquot of each specimen was crushed and lysed immediately in either TRIZOL reagent (Invitrogen Corp., Carlsbad, CA) to extract total RNA or the lysis buffer of a DNeasy Tissue kit (QIAGEN Inc., Valencia, CA) to extract DNA, according to the manufacturers' instructions.

Cell lines
To validate our findings in the ATP50 gene, we also studied HeLa S3, HT29, HCT15, HCT116, LoVo, CaCo2, LS174T, LS411N, and DLD1, purchased from the American Type Culture Collection (ATCC), and KYSE30, KYSE70, KYSE110, KYSE150, KYSE220, KYSE410, KYSE770, KYSE850, and OE33, obtained from Dr. Yutaka Shimada at Kyoto University, Japan (Shimada et al. 1992). Each cell line was cultured according to the recommendations of ATCC or of its establisher. All culture media were supplemented with 10% fetal bovine serum plus appropriate concentrations of penicillin and streptomycin.

Gene selection
To increase our chances of successfully amplifying and sequencing cDNAs, we restricted our study to genes known to be expressed in colorectal cancer cells, based on the gene expression database at the University of California, Santa Cruz (UCSC) [http://genome.ucsc.edu/index.html]. From this gene set, we selected the subset of genes (approximately 600) with open reading frames (ORFs) of 1,000 nucleotides or less. To automate the design of the large number of primer sets required, we developed an in-house primer design algorithm based on the publicly available program Primer3 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www_slow.cgi). PCR products were designed to range from 300 to 500 bp in length. ORFs longer than 500 bp were divided into 2 or 3 fragments, and primers were designed so that adjacent fragments overlapped, ensuring complete coverage of these longer ORFs. Finally, for each heterozygous sequence alteration, genomic DNA primers (available on request) were designed to confirm the cDNA sequencing results.
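The overlapping-fragment rule above can be sketched as a simple tiling function. This is a minimal sketch, not the in-house algorithm itself: the 50-bp overlap and the equal-length splitting are assumptions, and the function only chooses target regions on the ORF; Primer3 would still place the actual primers (and so fix the final product lengths) around each region.

```python
import math

MAX_AMPLICON = 500  # upper product length stated in the text
MIN_OVERLAP = 50    # hypothetical overlap between adjacent fragments (assumption)


def tile_orf(orf_length, max_amplicon=MAX_AMPLICON, min_overlap=MIN_OVERLAP):
    """Split an ORF into overlapping target regions of at most max_amplicon bp.

    Returns (start, end) pairs, 1-based and inclusive, on ORF coordinates.
    """
    if orf_length <= max_amplicon:
        return [(1, orf_length)]
    step_max = max_amplicon - min_overlap
    # Fewest fragments n with n * max_amplicon - (n - 1) * min_overlap >= orf_length
    n = 1 + math.ceil((orf_length - max_amplicon) / step_max)
    # Equal-length fragments keep every adjacent overlap at exactly min_overlap
    frag_len = math.ceil((orf_length + (n - 1) * min_overlap) / n)
    step = frag_len - min_overlap
    regions = []
    for i in range(n):
        start = i * step + 1
        end = min(start + frag_len - 1, orf_length)  # last fragment may be shorter
        regions.append((start, end))
    return regions


if __name__ == "__main__":
    for length in (400, 600, 1000):
        print(length, tile_orf(length))
```

With these assumed parameters, any ORF of 1,000 bp or less yields at most three fragments, consistent with the 2 or 3 fragments described; for example, a 1,000-bp ORF is split into (1, 367), (318, 684), and (635, 1000).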

RT-PCR
Total RNA extracted from the colorectal adenocarcinoma and normal colonic epithelium was reverse-transcribed using a SuperScript III First-Strand kit (Invitrogen, Carlsbad, CA) to make respective cDNA pools. PCR was performed using an AccuPrime Supermix I Kit (Invitrogen) with the following protocol: 1 min at 96 °C, followed by 35 cycles of 30 sec at 94 °C, 45 sec at 58 °C, and 1 min at 72 °C. Secondary PCR was performed with the same protocol, using purified product from the first RT-PCR as template.
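As a quick check on run length, the cycling program above can be encoded as data and its programmed hold times summed. This sketch ignores temperature ramp times and any final extension or hold step, none of which are specified in the text, so real run times will be somewhat longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


INITIAL = Step("initial denaturation", 96.0, 60)
CYCLE = (
    Step("denaturation", 94.0, 30),
    Step("annealing", 58.0, 45),
    Step("extension", 72.0, 60),
)
N_CYCLES = 35


def hold_time_seconds(initial=INITIAL, cycle=CYCLE, n_cycles=N_CYCLES):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return initial.seconds + n_cycles * sum(step.seconds for step in cycle)


if __name__ == "__main__":
    per_run = hold_time_seconds()
    # The secondary PCR reuses the same program, so each amplicon sees two runs
    print(f"{per_run} s per run, {2 * per_run} s for primary plus secondary PCR")
```

Each run comes to 4,785 s (about 80 min) of hold time, so the two-stage PCR roughly doubles the cycling time per amplicon.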

Sequencing
A BigDye Terminator v3.1 Kit (Applied Biosystems, Foster City, CA) was used for the sequencing reactions. Sequencing products were read on an SCE 9610 automated 96-capillary sequencer (SpectruMedix), processed with BaseSpectrum v2.10 (SpectruMedix), and analyzed with Mutation Surveyor v2.2 (SoftGenetics LLC, State College, PA). Whenever a candidate sequence alteration was found in cDNA from colorectal cancer tissue, the identical procedure was applied to the matched normal epithelium to determine whether the alteration was somatic. After candidate alterations were confirmed, the entire procedure was repeated on fresh aliquots of cDNA from both the cancer and normal specimens in order to exclude amplification or other technical errors arising from the two-stage PCR. Heterozygous sequence variants were also sequenced in genomic DNA to confirm that identical alterations were present there.
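The confirmation logic described above (tumor cDNA against normal cDNA, then genomic DNA for heterozygous calls) can be expressed as a small decision function. This is a hypothetical sketch: it takes already-called allele sets at a single position rather than raw chromatogram peaks, and the category labels are illustrative, not output from Mutation Surveyor.

```python
def classify_variant(tumor_cdna, normal_cdna, tumor_gdna=None, normal_gdna=None):
    """Classify one position from the alleles seen in each sample.

    Each argument is an iterable of base calls, e.g. {"T", "C"} for a
    heterozygous call. Genomic DNA calls are optional because genomic DNA
    was sequenced only to confirm heterozygous variants.
    """
    tumor_cdna, normal_cdna = set(tumor_cdna), set(normal_cdna)
    if tumor_cdna == normal_cdna:
        return "germline variant"
    tumor_only = tumor_cdna - normal_cdna
    if tumor_only:
        # An allele expressed only in tumor is somatic unless normal gDNA carries it
        if normal_gdna is not None and tumor_only <= set(normal_gdna):
            return "germline allele not expressed in normal tissue"
        return "candidate somatic mutation"
    # Tumor cDNA shows a strict subset of the alleles expressed in normal cDNA
    if tumor_gdna is None:
        return "allele missing from tumor cDNA; genomic DNA needed"
    if set(tumor_gdna) >= normal_cdna:
        return "monoallelic expression"
    return "loss of heterozygosity"
```

For the ATP50 site at nucleotide 108, `classify_variant({"C"}, {"T", "C"}, {"T", "C"}, {"T", "C"})` returns `"monoallelic expression"`, the same interpretation reached in the results below.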

Methylation-specific PCR (MSP)
Because ATP50 appeared to be mutated, raising the possibility that it is a tumor suppressor gene, we evaluated this gene for alternative inactivation via promoter hypermethylation. The MSP primer sequences of ATP50 for the methylated reaction were forward 5′-CGAGTGGGAGCGATTTAGGAC-3′ and reverse 5′-AACGCCAAAATTACGACACG-3′, which amplify a 94-bp product. β-actin was used as an internal control gene, with previously published MSP primers (Eads et al. 2001). CpGenome Universal Methylated DNA (Chemicon International, Inc., Temecula, CA) was used as a positive control. The detailed MSP procedure has been published previously (Sato et al. 2002).
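MSP depends on sodium bisulfite treatment, which converts unmethylated cytosine to uracil (read as thymine after PCR) while 5-methylcytosine is protected; primers for the methylated reaction therefore keep C at CpG sites. The sketch below shows that conversion on one strand under the simplifying assumption of all-or-nothing CpG methylation. The function and the toy sequence are illustrative and not part of the published MSP procedure.

```python
def bisulfite_convert(seq, cpg_methylated=True):
    """In-silico bisulfite conversion of one DNA strand written 5'->3'.

    Non-CpG cytosines always become T. CpG cytosines stay C when the template
    is assumed fully methylated, and become T when assumed fully unmethylated.
    A C at the last position has no known 3' neighbor and is treated as non-CpG.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        in_cpg = base == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
        if base == "C" and not (in_cpg and cpg_methylated):
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


if __name__ == "__main__":
    template = "ACGTCCGA"  # toy sequence, not ATP50
    print(bisulfite_convert(template, cpg_methylated=True))
    print(bisulfite_convert(template, cpg_methylated=False))
```

The methylated toy template converts to ACGTTCGA and the unmethylated one to ATGTTTGA; a primer matching only the first sequence amplifies only methylated DNA, which is why a product in the methylated reaction indicates promoter methylation.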

Microsatellite instability (MSI) assay
MSI at each locus was determined by analyzing the length of the PCR-amplified microsatellite. MSI status was assessed at five consensus loci (BAT25, BAT26, D2S123, D5S346, and D17S250) according to the criteria of a National Cancer Institute workshop (Boland et al. 1998). Detailed procedures were as described previously (Mori et al. 2001).
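Under the NCI workshop criteria cited above, a tumor is classed as MSI-high when at least two of the five reference markers are unstable, MSI-low when one is, and microsatellite stable (MSS) when none is. A minimal sketch of that final calling step follows; deciding whether an individual locus is unstable (new allele lengths in tumor relative to normal DNA) is assumed to have been done already.

```python
PANEL = ("BAT25", "BAT26", "D2S123", "D5S346", "D17S250")


def msi_status(unstable_loci):
    """Return 'MSI-H', 'MSI-L' or 'MSS' from the set of unstable panel markers."""
    unstable = set(unstable_loci)
    unknown = unstable - set(PANEL)
    if unknown:
        raise ValueError(f"not in the five-marker panel: {sorted(unknown)}")
    if len(unstable) >= 2:
        return "MSI-H"
    if len(unstable) == 1:
        return "MSI-L"
    return "MSS"
```

A tumor with no unstable markers, as reported for this case, corresponds to `msi_status([])`, which returns `"MSS"`.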

Project overview
A total of 603 genes (S-Table 1) were selected based on ORF length (1,000 nucleotides or less) and predicted expression in colorectal cancers according to the UCSC database. We designed 1,038 primer pairs to cover the entire ORFs (total, 366,687 bp) of these 603 genes. Sequence data from 862 (83.0%) of these 1,038 primer sets were successfully analyzed, so that approximately 304,350 bp were successfully sequenced. All primer sets for RT-PCR and cDNA sequencing are available on request.
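The success rates quoted above follow directly from the counts in this section. A short check using those totals:

```python
def success_rate(succeeded, attempted):
    """Fraction of attempted units (primer sets or base pairs) that succeeded."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return succeeded / attempted


PRIMER_SETS_OK, PRIMER_SETS_TOTAL = 862, 1_038
BP_OK, BP_TOTAL = 304_350, 366_687

if __name__ == "__main__":
    print(f"primer sets: {success_rate(PRIMER_SETS_OK, PRIMER_SETS_TOTAL):.1%}")
    print(f"ORF bases:   {success_rate(BP_OK, BP_TOTAL):.1%}")
```

Both ratios round to 83.0%.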

Sequence variants
Seventy-eight sequence variants within 50 genes were found among the 603 genes studied (Table 1; see S-Table 2 for details). The frequency of sequence variants was therefore 1 per 3,902 bp (304,350 bp/78 variants). Of these 78 sequence alterations, 51 were homozygous (26 synonymous, 25 non-synonymous) and 27 were heterozygous (11 synonymous, 16 non-synonymous). All sequence alterations were detected in both colorectal cancer tissue and matched normal colonic epithelium; only the two alterations in the gene ATP50 (NM_001697) showed a tumor-specific expression pattern (Fig. 1). Forty-four sequence alterations had been reported previously, whereas 34 were novel, never having been reported in the SNP database (dbSNP) of the National Center for Biotechnology Information (NCBI).

Figure 1. cDNA sequencing of ATP50. Two alteration sites were detected. At nucleotide 108, colorectal cancer tissue had only the mutant cytosine, whereas normal colon contained both thymine (wild type) and cytosine (mutant). Codons GGT and GGC both encode glycine (synonymous alteration). At nucleotide 218, colorectal cancer tissue had only the mutant guanine, whereas normal tissue contained both adenine (wild type) and guanine (mutant). AAA encodes lysine and AGA encodes arginine (non-synonymous alteration). Gly, glycine; Lys, lysine; Arg, arginine.
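The category totals above can be cross-checked in a few lines; the counts are taken from this section and the code only re-adds them.

```python
from collections import Counter

# Variant counts as reported in this section: (zygosity, effect) -> count
VARIANT_COUNTS = {
    ("homozygous", "synonymous"): 26,
    ("homozygous", "non-synonymous"): 25,
    ("heterozygous", "synonymous"): 11,
    ("heterozygous", "non-synonymous"): 16,
}
BP_SEQUENCED = 304_350
PREVIOUSLY_REPORTED, NOVEL = 44, 34


def summarize(counts=VARIANT_COUNTS, bp=BP_SEQUENCED):
    """Return total variants, totals by zygosity and by effect, and bp per variant."""
    by_zygosity, by_effect = Counter(), Counter()
    for (zygosity, effect), n in counts.items():
        by_zygosity[zygosity] += n
        by_effect[effect] += n
    total = sum(counts.values())
    return total, dict(by_zygosity), dict(by_effect), bp / total


if __name__ == "__main__":
    total, by_zygosity, by_effect, bp_per_variant = summarize()
    assert total == PREVIOUSLY_REPORTED + NOVEL  # 44 known + 34 novel
    print(total, by_zygosity, by_effect, f"1 variant per {bp_per_variant:,.0f} bp")
```

Summed across zygosity, 37 of the 78 variants are synonymous and 41 are non-synonymous.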

Tumor-specific regulation of gene expression
Tumor-specific regulation of gene expression was found for NM_001697 (ATP50; Homo sapiens ATP synthase, H+ transporting, mitochondrial F1 complex, O subunit). The alterations T108C (GGT to GGC, homozygous, Gly36Gly) and A218G (AAA to AGA, homozygous, Lys73Arg) were observed only in cancer-derived cDNA, whereas the corresponding heterozygous alterations T108TC (GGT and GGC, 36Gly) and A218AG (AAA and AGA, 73Lys and 73Arg) were observed in cDNA from normal epithelium. Surprisingly, genomic DNA from both cancer and normal tissue showed the same two heterozygous alterations as normal cDNA (Figures 2 and 3). This result implies that the cancer expressed ATP50 monoallelically from the variant allele, whereas the normal epithelium expressed both the reference allele and the variant allele.
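The codon numbers and amino-acid changes for the two ATP50 sites can be derived from the nucleotide positions alone. A minimal sketch, assuming positions are counted from the A of the start codon (position 1) and taking the reference codons GGT and AAA from Figure 1; the standard genetic code is built from the usual table enumerated in T, C, A, G order.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_position(nt_pos):
    """Map a 1-based ORF nucleotide position to (codon number, position in codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def annotate(nt_pos, ref_codon, alt_base):
    """Apply a single-base change inside ref_codon and report its coding effect."""
    codon_no, offset = codon_position(nt_pos)
    alt_codon = ref_codon[: offset - 1] + alt_base + ref_codon[offset:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    effect = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return codon_no, alt_codon, ref_aa, alt_aa, effect


if __name__ == "__main__":
    print(annotate(108, "GGT", "C"))  # T108C
    print(annotate(218, "AAA", "G"))  # A218G
```

T108C falls on the third base of codon 36 (GGT to GGC, Gly to Gly) and A218G on the second base of codon 73 (AAA to AGA, Lys to Arg), consistent with the Gly36Gly and Lys73Arg designations.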

MSP
One possible mechanism for the monoallelic expression of ATP50 is DNA methylation of its promoter region. MSP showed, however, no methylation of the ATP50 promoter in the colorectal cancer (S-Fig. 1).

Somatic mutations
There were no somatic mutations found among the 603 genes studied or within the p53 gene.

MSI status
MSI assays showed no microsatellite instability in genomic DNA (S-Fig. 2).

Discussion
In the current study, we assumed that if a mutant protein were involved in carcinogenesis or tumor progression, the mutant transcript would be expressed and therefore detectable in tumor mRNA; that is, we assumed that somatic mutations involved in carcinogenesis or tumor progression would be detectable by direct cDNA sequencing. This strategy avoided the need to sequence each exon of genomic DNA, on the reasoning that genes never expressed in normal or malignant colon probably do not participate in colorectal carcinogenesis. We discovered 78 sequence variants (44 previously reported as single-nucleotide polymorphisms and 34 never reported) among the 603 genes (304,350 bp of ORFs) studied.
Recently, Sjoblom et al. performed genome-wide sequencing in breast and colorectal cancers, revealing that an average of 52 mutations occurred in each colorectal cancer (Sjoblom et al. 2006). According to Sjoblom et al., the somatic mutation frequency in colon cancers was 3.2 somatic mutations/Mb on average (Table 1 of their paper). Treating each sequenced base as independently mutated at this rate, the probability of our finding zero somatic mutations in the 304,350 bp of the 603 genes we studied was 37.76%, suggesting that our findings were statistically quite consistent with Sjoblom's results:

$$P(0) = \left(1 - 3.2 \times 10^{-6}\right)^{304{,}350} \approx e^{-3.2 \times 10^{-6} \times 304{,}350} = e^{-0.974} \approx 0.3776$$

The Sjoblom team also defined "CAN-genes" (candidate cancer genes) as genes frequently mutated in colorectal cancers and placed 69 genes in this category. Although we studied the CAN-genes KRAS, GNAS, and TP53, no somatic mutations were found in them. NRAS, HRAS, p16, and p27 were also included in the current study, and these genes likewise contained no somatic mutations. Finally, MSI assays revealed microsatellite stability (MSS), suggesting the absence of inactivating defects in the major DNA mismatch repair genes (although these genes were not sequenced because of their long ORFs). Other molecular pathogenetic pathways may have been involved in this tumor, such as those involving APC, MCC, DCC, or the TGF-β cascade; these genes were also not examined in the current study because of their ORF length.
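The 37.76% figure can be reproduced numerically. This sketch assumes that every sequenced base is mutated independently at the average rate of 3.2 mutations/Mb cited from Sjoblom et al. (2006), and computes both the exact binomial form and its Poisson approximation.

```python
import math

RATE_PER_BP = 3.2e-6   # 3.2 somatic mutations per Mb, as cited from Sjoblom et al. (2006)
BP_SEQUENCED = 304_350


def prob_zero_mutations(rate=RATE_PER_BP, n_bp=BP_SEQUENCED):
    """Probability of observing no somatic mutation in n_bp sequenced bases."""
    binomial = (1.0 - rate) ** n_bp      # each base mutated independently
    poisson = math.exp(-rate * n_bp)     # Poisson with mean rate * n_bp
    return binomial, poisson


if __name__ == "__main__":
    binomial, poisson = prob_zero_mutations()
    print(f"expected mutations: {RATE_PER_BP * BP_SEQUENCED:.3f}")
    print(f"P(0) binomial {binomial:.4f}, Poisson {poisson:.4f}")
```

Both forms give about 0.3776, with fewer than one somatic mutation (about 0.97) expected in the sequenced bases at this rate.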
Approximately 24,000,000 bp of the genomic DNA sequence are annotated as ORFs in the UCSC database. The average SNP density is one per 1.9 kb (1,419,190 SNPs per 2.7 gigabases of human genome sequence) (Sachidanandam et al. 2001). We sequenced 304,350 bp of ORFs (about 1.27% of the total ORFs in the UCSC database: 304,350 bp/24,000,000 bp) and discovered 78 sequence variants, a frequency of 1 variant per 3,902 bp (304,350 bp/78). This observed variant frequency may provide a basis for estimating the number of SNPs carried by a single individual with colon cancer. The SNPs in the database represent variation pooled across many individuals; any one individual harbors only a subset of them.
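The densities in this paragraph can be compared directly. The sketch below uses only the figures given in the text; the extrapolation to all UCSC ORFs is a naive linear one that assumes the sequenced ORFs are representative, and the database density is genome-wide and pooled across individuals, so it is not an expected per-person count for coding sequence.

```python
ORF_BP_UCSC = 24_000_000   # ORF base pairs in the UCSC database (approximate)
GENOME_BP = 2.7e9          # genome size used for the SNP density in the text
DBSNP_COUNT = 1_419_190    # SNP count used for that density
BP_SEQUENCED = 304_350
VARIANTS_FOUND = 78


def density_summary():
    """Compare observed and database variant densities; extrapolate linearly."""
    observed_per_bp = VARIANTS_FOUND / BP_SEQUENCED
    database_per_bp = DBSNP_COUNT / GENOME_BP
    return {
        "fraction_of_orfs": BP_SEQUENCED / ORF_BP_UCSC,
        "bp_per_observed_variant": 1 / observed_per_bp,
        "bp_per_database_snp": 1 / database_per_bp,
        "database_snps_expected_in_sequenced_bp": database_per_bp * BP_SEQUENCED,
        "variants_extrapolated_to_all_orfs": observed_per_bp * ORF_BP_UCSC,
    }


if __name__ == "__main__":
    for key, value in density_summary().items():
        print(f"{key}: {value:.4g}")
```

At the database density, about 160 SNPs would fall in the 304,350 bp sequenced, compared with the 78 variants observed; the naive extrapolation of the observed rate to all 24,000,000 ORF base pairs gives about 6,151 variants.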
The human ATP50 gene (X83218, NM_001697), encoding the 213-amino-acid ATP synthase OSCP subunit, is a key structural component of the stalk of the mitochondrial respiratory-chain F1F0-ATP synthase, a vital element in the cellular pathway of energy conversion (Senior, 1988). A mutant yeast strain in which the delta subunit of F1F0-ATP synthase had been inactivated by insertional mutagenesis showed little or no ATPase activity (Giraud and Velours, 1994), and dysfunction of ATP synthase can cause a variety of degenerative diseases (Wallace, 1994); however, no previous reports have described a relationship between ATP synthase and tumorigenesis. In our colon cancer tissue, ATP50 was expressed monoallelically from the altered allele (i.e., the other allele was silenced), which would be expected to have the same effect as a somatic mutation of this gene. Genomic DNA sequencing of ATP50 showed that both alleles were retained in the tumor, so this monoallelic expression was not due to loss of heterozygosity (LOH). We therefore examined the methylation status of the CpG island in the ATP50 promoter region by MSP, but found no methylation. Other epigenetic mechanisms, such as histone deacetylation, might account for the monoallelic expression of ATP50. No monoallelic expression of ATP50 was found in the 20 cancer cell lines we examined. Although monoallelic expression of this altered ATP50 allele may be involved in a subset of colorectal cancers, further study is required to clarify the potential functional role of this gene in carcinogenesis.
This study has several advantages as well as limitations. Firstly, some synonymous mutations have been reported to influence mRNA stability (Duan and Antezana, 2003; Chamary and Hurst, 2005) because they affect the thermodynamic stability of mRNA secondary structures (Fitch, 1974; Klambt, 1975). Nonsense-mediated mRNA decay (NMD) is also known to be a surveillance pathway that rapidly degrades mRNAs containing premature termination codons (Culbertson and Leeds, 2003; Amrani et al. 2006). These mechanisms can destabilize mRNA and accelerate its degradation, making sequence alterations difficult to detect by cDNA sequencing. Because we used cDNA as our starting material, we may have missed some key genes as a result of RNA degradation. Nevertheless, many sequence variants were detected readily in the current study, suggesting that mRNA degradation as a consequence of sequence alterations occurred rarely, if at all. On balance, we considered it more important to increase our chances of finding sequence alterations by using cDNA rather than genomic DNA, because sequencing cDNA requires less cost, time, and labor and restricts the study to genes that are actually expressed in the colon.
Secondly, it is conceivable that we lost some gene sequence information because of extremely low expression levels. We therefore employed two-stage PCR to increase our chances of successful sequencing, achieving a relatively high success rate of 862/1,038 reactions (83.0%). The failed reactions may still have included genes that were not expressed in this particular colorectal cancer, even though we used the UCSC database to select genes reportedly expressed in colorectal cancers. Our sequencing success rate compares reasonably well with that of genomic DNA sequencing, in which 92% of genes were successfully analyzed. The total number of exons sequenced in our study was 2,107, implying that at least 2,107 primer pairs would have been needed to conduct this study by genomic DNA sequencing; in contrast, we accomplished the task with only 1,038 primer sets for cDNA sequencing. This contrast shows that our method is useful for exploring mutations because it is not only more cost-effective but also less demanding in time and labor.